Linguistic Constraints on Statistical Word Segmentation: The Role of Consonants in Arabic and English.

Authors

  • Itamar Kastner
  • Frans Adriaans
Abstract

Statistical learning is often taken to lie at the heart of many cognitive tasks, including the acquisition of language. One particular task in which probabilistic models have achieved considerable success is the segmentation of speech into words. However, these models have mostly been tested against English data, and as a result little is known about how a statistical learning mechanism copes with input regularities that arise from the structural properties of different languages. This study focuses on statistical word segmentation in Arabic, a Semitic language in which words are built around consonantal roots. We hypothesize that segmentation in such languages is facilitated by tracking consonant distributions independently from intervening vowels. Previous studies have shown that human learners can track consonant probabilities across intervening vowels in artificial languages, but it is unknown to what extent this ability would be beneficial in the segmentation of natural language. We assessed the performance of a Bayesian segmentation model on English and Arabic, comparing consonant-only representations with full representations. In addition, we examined to what extent structurally different proto-lexicons reflect adult language. The results suggest that for a child learning a Semitic language, separating consonants from vowels is beneficial for segmentation. These findings indicate that probabilistic models require appropriate linguistic representations in order to effectively meet the challenges of language acquisition.
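
To make the contrast between consonant-only and full representations concrete, the following is a minimal sketch, not the Bayesian segmentation model evaluated in the study. It swaps in simple adjacent transitional probabilities as the statistic, and it assumes a toy romanized transcription with one character per segment. The vowel set, the function names, and the three example forms (simplified romanizations built on the root k-t-b) are illustrative assumptions rather than material from the paper.

```python
from collections import Counter

# Assumption: toy romanization where each character is one segment.
VOWELS = set("aeiou")


def consonant_tier(utterance):
    """Drop vowels, keeping only the consonant skeleton of an utterance."""
    return "".join(ch for ch in utterance if ch not in VOWELS)


def transitional_probabilities(sequences):
    """Estimate P(next | current) for adjacent symbols across all sequences."""
    pair_counts = Counter()
    first_counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            pair_counts[(current, following)] += 1
            first_counts[current] += 1
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical toy input: three forms sharing the consonantal root k-t-b.
utterances = ["kataba", "kitab", "kutiba"]

# Full representation: vowels intervene between the root consonants.
full_tps = transitional_probabilities(utterances)

# Consonant-only representation: the root consonants become adjacent.
consonant_tps = transitional_probabilities(
    [consonant_tier(u) for u in utterances]
)
```

In this toy input, the full representation spreads the probability mass after "k" over three different vowels, whereas on the consonant tier "k" is always followed by "t". That is the kind of regularity the hypothesis expects a learner of a root-based language to be able to exploit; whether it helps on natural-language corpora is what the study itself tests.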




Journal:
  • Cognitive Science

Publication date: 2017